- 1
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- EPISTAT
- Statistical Package
- for the IBM Personal Computer
-
- Version 2.1, 1983
-
-
-
-
-
- Written by:
-
- Tracy L. Gustafson, M.D.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- 2
-
-
-
-
-
- INTRODUCTION
-
-
- EPISTAT is a collection of programs written in BASICA for
- statistical analysis of small to medium-sized data samples ( < 1000
- observations per sample and < 28 data samples per file). It includes
- programs to ENTER, APPEND, and EDIT data, as well as perform several
- kinds of data TRANSFORMATIONS. The datafiles can be PRINTED, GRAPHED,
- or SAVED to disk. The 21 programs in EPISTAT can also perform 34
- common statistical tests or functions.
-
-
- The programs are intended to be as self-explanatory and user-
- friendly as possible. All questions can be answered with a number,
- a "Y" for yes, or an "N" for no. A thorough study of this guide is not
- necessary before using the programs. On the other hand, neither the
- programs nor this manual purport to TEACH the proper use or interpre-
- tation of statistics. Rather, some familiarity with the kinds of
- data required and the underlying assumptions appropriate to each
- statistical test is assumed.
-
-
- One will note that some of the programs emphasize epidemiologic
- and medical applications. Despite the wording of various program
- questions or statements, these tests also apply to many other types of
- data. For further explanations of tests, refer to:
-
- 1. Colton, Theodore. Statistics in Medicine. Little, Brown and Co.
- Boston, 1974.
- 2. Fleiss, Joseph. Statistical Methods for Rates and Proportions.
- John Wiley and Sons. New York, 1973.
-
-
-
-
-
-
-
- CAVEAT:
- These programs have been tested extensively, but I cannot
- guarantee that they will work correctly with every possible data set
- or in every possible situation. Incorrect results are usually due
- to errors in the format or type of data entered. If you believe you
- have discovered a problem in the programs, please write me. I intend
- to fix any bugs that are brought to my attention.
- It is good practice to regularly compare the results obtained
- by programs in EPISTAT with results obtained by your previous method
- of calculation until you are familiar with each program. ANY
- unexpected result should be questioned and double-checked by
- reference to tables or another method of calculation.
-
-
-
-
-
-
-
-
-
-
- 3
-
-
-
- INDEX TO EPISTAT
- The following statistical tests and functions are available:
-
- TEST or FUNCTION                                  PROGRAM NAME
- ----------------                                  ------------
- Analysis of variance (1-way)......................ANOVA
- Analysis of variance (2-way)......................ANOVA
- Bayes' theorem:
-     False positive and false negative tests.......BAYES
-     Probability of event given positive test......BAYES
- Binomial distribution.............................BINOMIAL
- Chi-square distribution...........................CHISQR
- Chi-square test...................................CHISQR
- Correlation coefficient (Pearson's)...............CORRELAT
- F distribution....................................ANOVA
- Fisher's exact test...............................FISHERS
- Linear regression analysis........................LNREGRES
- Mantel-Haenszel Chi-square test...................MHCHISQR
- Mantel-Haenszel for multiple controls.............MHCHIMLT
- McNemar's test....................................MCNEMAR
- Mean..............................................DATA-ONE
- Median............................................DATA-ONE
- Normal distribution...............................NORMAL
- Percent of values in given range..................NORMAL
- Poisson distribution..............................POISSON
- Random sample generator:
-    Select sample from a population................RANDOMIZ
-    Assign unpaired cases and controls.............RANDOMIZ
-    Assign paired cases and controls...............RANDOMIZ
- Rank correlation (Spearman's).....................RANKTEST
- Rank sum test.....................................RANKTEST
- Rates adjusted, direct method.....................RATEADJ
- Rates adjusted, indirect method...................RATEADJ
- Sample size calculations:
-    For estimating population rate.................SAMPLSIZ
-    For unpaired case-control study................SAMPLSIZ
-    For paired case-control study..................SAMPLSIZ
- Signed rank test..................................RANKTEST
- Standard deviation................................DATA-ONE
- Student's T-test (independent samples)............T-TEST
- Student's T-test (paired samples).................T-TEST
- T distribution....................................T-TEST
-
-
- In addition, the following data-handling capabilities are available:
-
- DATA MANIPULATION                                 PROGRAM NAME
- -----------------                                 ------------
- Determine best test and program names.............EPISTAT
- Enter, append and edit data.......................DATA-ONE
- Graph data in histogram...........................HISTOGRM
- Print data (sorted or as entered).................DATA-ONE
- Perform data transformations......................LNREGRES
- Save data to disk file............................DATA-ONE
- Transfer data samples from one file to another....FILETRAN
-
-
-
-
-
-
-
- 4
-
-
-
-
- SYSTEM REQUIREMENTS FOR EPISTAT
-
-        MINIMUM                      OPTIMAL
-        IBM PC with 64K RAM          IBM PC with 96K RAM
-        One 160K disk drive          Two disk drives
-        Monochrome monitor           Color graphics adapter
-        BASICA                       Hi-res color monitor
-                                     BASICA
-                                     IBM or Epson printer
-                                       with graphics
-
-
-
-
-
-
-
-
- EPISTAT - OVERALL PROGRAM DESCRIPTION
-
-
- All calculations in EPISTAT are performed using single precision.
- Although it may first appear that double precision would be more
- appropriate for statistical tests, "double" precision makes little or
- no real improvement in precision in these programs. Many of the
- algorithms used to evaluate p values use trigonometric functions which
- are calculated in single precision, anyway. Specifying double
- precision only serves to considerably slow the calculations. For
- best results, data entries should be numbers between 1E+7 and 1E-7.
- Larger or smaller numbers should be multiplied by an appropriate
- power of 10 before entry and analysis in EPISTAT.
-
-
- All EPISTAT programs are written so that as much pertinent
- information about the test as possible can fit on the final screen.
- This feature allows a summary printed copy to be produced simply by
- pressing <Shift-PrtSc>. This will work any time there is a pause in
- the program display. Three programs, "DATA-ONE", "HISTOGRM", and
- "RANDOMIZ", produce printed reports without using <Shift-PrtSc>. In
- these, simply follow program instructions to route output to your
- printer.
-
-
- EPISTAT is the introductory program in the EPISTAT package.
- DATA-ONE is the major data entry, editing, and printing program.
- Most of the programs in EPISTAT can evaluate data entered and saved
- using DATA-ONE. Many of the programs can, in addition, evaluate
- summary data entered without first using DATA-ONE. The programs
- marked with a star (*) in the individual descriptions that follow
- can evaluate raw data SAVED to disk with DATA-ONE. Non-starred
- programs provide their own data entry routines.
-
-
-
-
-
-
-
-
-
-
- 5
-
-
- INDIVIDUAL PROGRAM DESCRIPTIONS
-
-
- (1) "EPISTAT"
- This introductory program lists the available programs and aids
- the user in selecting the best statistical test for his or her data.
-
- (2) "DATA-ONE"
-
- DATA ENTRY:
- This is the central data entry program for the EPISTAT package.
- Initial data entry is accomplished by selecting option 1 and following
- the instructions to name each sample. Type in your numbers and
- press <Enter> twice after each entry. The maximum number of samples
- (S) in a datafile is 28 with a color and 7 with a monochrome adapter.
- The maximum number of records in each sample is 2000/S. A blank record
- can be entered if no data is available for a given cell (or if 2 samples
- with different numbers of observations are being entered) by pressing
- <Enter>, then Key F2. To exit the data entry mode, simply press <Enter>
- then key F10 following the last record. The mean, median and standard
- deviation are then calculated and displayed automatically.
- When you return to the main menu, choose option 5 (see below) to
- save your datafile to disk for future modification or use by other
- programs in the EPISTAT package.
- Although all entries in a datafile are treated as numbers by
- DATA-ONE, it is possible to enter character strings in a record. Such
- strings will be treated as zeros in all calculations. Nevertheless,
- when entering several samples, it often improves data readability to
- use the "Sample #1" column for names or identifying information about
- each ROW of data. Thus, DATA-ONE allows one to specify a name for
- each column and row in the datafile.
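-
The summary statistics described above can be sketched in Python. This is only an illustration, not the DATA-ONE source: the names `summarize` and `max_records` are hypothetical, blank records are assumed to be represented by `None` and skipped, the standard deviation is assumed to be the sample (n-1) form, and 2000/S is assumed to round down.

```python
import statistics

def summarize(sample):
    """Return (mean, median, standard deviation) for one sample.

    Assumptions: None marks a blank record and is skipped; character
    strings count as zero, as the manual states; the standard deviation
    uses the n-1 denominator.
    """
    values = [0.0 if isinstance(v, str) else float(v)
              for v in sample if v is not None]
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

def max_records(samples):
    """Maximum records per sample for a datafile with `samples` samples (2000/S)."""
    return 2000 // samples

print(summarize([2, 4, None, 9]))
print(max_records(28), max_records(7))
```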
-
- DATA MODIFICATION:
- Option 2, APPEND, allows one to add more observations to a sample
- after initial data entry has been terminated. Option 3, EDIT, allows
- one to delete or replace incorrect data entries. Both of these options
- can be used to modify a datafile that has been loaded from disk. Of
- course, if you modify a datafile in any way, you will want to SAVE the
- modified datafile to disk again using Option 5.
-
- PRINTING DATA:
- To view or review a datafile, a printout to screen or printer can
- be obtained, Option 4. To print a datafile exactly as it was keyed in,
- request the printout in INPUT order. DATA-ONE has the additional
- capability to present the data SORTED in the order of any selected
- sample. Remember, only numeric data is sorted by DATA-ONE, so it will
- not alphabetize a character field. Further, sorted data will print
- only NON-BLANK records in the selected sort sample.
-
- SAVING DATAFILES and LOADING DATAFILES:
- Option 5, SAVE datafile, writes your data to disk in a sequential
- file for later editing, review, or use by another program. DATA MUST
- BE SAVED TO DISK before it can be used by other programs in EPISTAT.
- The name chosen for each DATAFILE must conform to the rules for IBM
- disk file names (see p. 3-36 in BASIC manual). If you have a 2-drive
- system, you will probably want to put the EPISTAT disk in drive A: and
- SAVE datafiles on drive B. To do so, simply precede each datafile
- name with B: (e.g. B:TESTDATA). Note that file names entered in
- DATA-ONE do not need to be enclosed in quotation marks.
-
-
-
-
- 6
-
-
-
-
- (3) "ANOVA" *
-
- Provides ONE-WAY and TWO-WAY analysis of variance. ONE-WAY ANOVA
- compares the means of 3 or more samples. TWO-WAY ANOVA compares the
- combined effects of 2 variables on a third (ROW and COLUMN effects).
- All samples in TWO-WAY ANOVA must have the same number of elements.
- The program also provides for evaluation of a known F value.
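-
For readers who want to check a ONE-WAY result by hand, the F ratio can be sketched as follows. This is the standard textbook calculation, not the ANOVA program itself; the function name `one_way_anova_f` is hypothetical, and the p value lookup in the F distribution is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Variation of each group mean around the grand mean
        ss_between += len(g) * (mean - grand_mean) ** 2
        # Variation of observations around their own group mean
        ss_within += sum((x - mean) ** 2 for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, dfb, dfw = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 4), dfb, dfw)
```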
-
- (4) "BAYES"
-
- Using Bayes' theorem, this program calculates the rates of false
- positive and false negative tests given different sensitivities,
- specificities, and disease incidences. Using the formula in a different
- way, it can also calculate the posterior probability of several diseases
- given a positive test.
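-
As a rough illustration of the Bayes calculation, the sketch below computes the probability of disease given a positive test from a sensitivity, a specificity, and a disease frequency. The function name and the exact definitions of the two error rates (the fraction of positive tests that are false, and the fraction of negative tests that are false) are assumptions; the BAYES program may define them differently.

```python
def bayes_test_rates(sensitivity, specificity, prevalence):
    """Apply Bayes' theorem to a diagnostic test (all inputs are proportions)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return {
        # Probability of disease given a positive test
        "p_disease_given_positive": true_pos / (true_pos + false_pos),
        # Share of positive tests that are false (assumed definition)
        "false_positive_share": false_pos / (true_pos + false_pos),
        # Share of negative tests that are false (assumed definition)
        "false_negative_share": false_neg / (false_neg + true_neg),
    }

rates = bayes_test_rates(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(round(rates["p_disease_given_positive"], 4))
```

With a rare disease, most positive tests are false positives even for a fairly specific test, which is the situation this kind of calculation is meant to expose.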
-
- (5) "BINOMIAL"
-
- The binomial distribution allows calculation of the probability
- of an observed number compared to the expected. It assumes the variable
- is dichotomous and has an equal probability of occurring in each trial.
- This program calculates the ONE-TAILED probability of the entered
- number and all more extreme situations. For example, in the case of
- 2 heads in 10 tosses of a coin, the ONE-TAILED probability includes the
- sum of the probabilities for 0,1 and 2 heads out of 10 tosses.
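-
The coin-toss example can be reproduced with a short sketch. The function name is hypothetical, and the rule for choosing the tail (sum downward when the observed count is at or below the expected count, upward otherwise) is an assumption about what "more extreme" means; the manual only gives the lower-tail example.

```python
from math import comb

def binomial_one_tailed(k, n, p):
    """One-tailed probability of k successes in n trials, plus all more extreme counts."""
    def pmf(i):
        return comb(n, i) * p ** i * (1 - p) ** (n - i)
    if k <= n * p:
        # Observed count is at or below expectation: sum 0..k
        return sum(pmf(i) for i in range(0, k + 1))
    # Observed count is above expectation: sum k..n
    return sum(pmf(i) for i in range(k, n + 1))

# 2 heads in 10 tosses: P(0) + P(1) + P(2) = 56/1024
print(binomial_one_tailed(2, 10, 0.5))  # 0.0546875
```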
-
- (6) "CHISQR"
-
- The Chi-square test evaluates either a table of data or a known
- chi-square value. 2 by 2 tables are automatically evaluated using
- Yates' correction. Tables larger than 15 by 10 cells will not fit
- on a single screen.
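-
For a 2 by 2 table with cells a, b, c, d, Yates' correction can be sketched as below. This is the usual textbook formula, offered as a way to double-check results rather than as the CHISQR source; the function name is hypothetical, and the p value uses the identity between a chi-square with 1 degree of freedom and a squared standard normal deviate.

```python
from math import erfc, sqrt

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero
    diff = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of chi-square with 1 degree of freedom
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

chi2, p = yates_chi_square(10, 20, 30, 40)
print(round(chi2, 4), round(p, 3))
```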
-
- (7) "CORRELAT" *
-
- Pearson's correlation coefficient assesses the correlation
- between paired samples. The probability of a given R value is
- evaluated using the T distribution.
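-
The link between R and the T distribution can be illustrated with the standard conversion t = R * SQR((n-2)/(1-R*R)) on n-2 degrees of freedom. The sketch below is an assumption about the method, since the manual does not give the formula, and it stops at the t statistic; the function name is hypothetical.

```python
from math import sqrt

def pearson_r_and_t(x, y):
    """Return (r, t, degrees of freedom) for paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    # Undefined when r is exactly +1 or -1 (division by zero)
    t = r * sqrt((n - 2) / (1 - r * r))
    return r, t, n - 2

r, t, df = pearson_r_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(t, 4), df)
```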
-
- (8) "FISHERS"
-
- Fisher's exact test evaluates 2 by 2 tables of discrete variables.
- It is particularly valuable when the Chi-square test cannot be used
- because the expected value for a cell is < 5. However, this program
- can evaluate only some tables where A+B+C+D > 200.
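-
A one-tailed Fisher's exact probability can be sketched from the hypergeometric distribution as follows. The function name, the cell layout [[A, B], [C, D]], and the choice of tail (toward whichever side of its expected value cell A lies) are assumptions for illustration; the FISHERS program may report a different tail or a two-tailed value.

```python
from math import comb

def fisher_one_tailed(a, b, c, d):
    """One-tailed exact probability for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    expected = row1 * col1 / n
    if a >= expected:
        cells = range(a, highest + 1)   # observed table and larger A
    else:
        cells = range(lowest, a + 1)    # observed table and smaller A
    return sum(prob(x) for x in cells)

print(round(fisher_one_tailed(3, 1, 1, 3), 4))
```

Python integers do not overflow, so this sketch is not subject to the table-size limits of a single precision implementation.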
-
- (9) "HISTOGRM" *
-
- The histogram program graphs a data sample according to user
- specifications on the high resolution graphics screen. This screen
- image can be printed on an IBM or Epson printer with graphics features.
- To obtain a printed copy, simply press key F10 after the graph is
- displayed on screen. (Printing takes several minutes). If you do not
- want a printed copy, press key F1 to return to the program.
-
-
-
-
-
-
-
- 7
-
-
-
- (10) "LNREGRES" *
-
- Linear regression analysis calculates the least-squares regression
- line for paired samples. It then uses the T distribution to determine
- if the calculated slope is significantly different from zero. The
- program also allows the user to specify several types of data
- transformations prior to regression analysis. Transformed data
- samples can be saved to disk for future use (or printout).
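-
The least-squares line and the t statistic for its slope can be sketched as below. This follows the standard formulas and is not taken from the LNREGRES source; the function name is hypothetical, and the final lookup of t in the T distribution (n-2 degrees of freedom) is omitted.

```python
from math import sqrt

def linear_regression(x, y):
    """Return (slope, intercept, t for slope, degrees of freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    ss_resid = syy - slope * sxy
    # Standard error of the slope; zero for a perfect fit, so t is undefined there
    se_slope = sqrt(ss_resid / (n - 2) / sxx)
    return slope, intercept, slope / se_slope, n - 2

slope, intercept, t, df = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 4), round(intercept, 4), round(t, 4), df)
```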
-
- (11) "MHCHISQR"
-
- The Mantel-Haenszel Chi-square test evaluates the relationship
- between two discrete variables while controlling for the effect of
- a third variable.
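-
One common form of the Mantel-Haenszel statistic sums the observed and expected counts of the first cell over one 2 by 2 table per level of the third variable. The sketch below uses that form with a 0.5 continuity correction; both the correction and the function name are assumptions, and MHCHISQR may differ in detail.

```python
def mantel_haenszel_chi_square(strata):
    """Chi-square (1 df) from a list of (a, b, c, d) tables, one per stratum."""
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        # Expected value and variance of cell a under no association
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    # Assumed continuity correction of 0.5, not allowed to go negative
    diff = max(abs(observed - expected) - 0.5, 0)
    return diff ** 2 / variance

print(round(mantel_haenszel_chi_square([(10, 20, 30, 40), (10, 20, 30, 40)]), 4))
```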
-
- (12) "MHCHIMLT" *
-
- The Mantel-Haenszel Chi-square test for multiple controls compares
- one sample (the case sample) to 2 or more matched samples (control
- samples). The program can evaluate raw data input using DATA-ONE, if
- the data is entered as "1" for factor present, and "0" for factor
- absent in each case and control sample record. The program will also
- evaluate summary data entered per program instructions.
-
- (13) "MCNEMAR"
-
- McNemar's test, or the paired Chi-square test, evaluates 2 by 2
- tables of paired discrete variables. It compares discordant pairs
- (using Yates' correction) and calculates a probability that compares
- very well to the results of the binomial distribution.
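-
The agreement between McNemar's test and the binomial distribution can be checked with a sketch like the one below, where b and c are the two discordant pair counts. The function name is hypothetical, and the exact comparison value is assumed to be a two-tailed binomial probability with p = 0.5.

```python
from math import comb, erfc, sqrt

def mcnemar(b, c):
    """Return (chi-square, chi-square p value, exact binomial p value)."""
    n = b + c
    # Yates-corrected chi-square on the discordant pairs
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    p_chi = erfc(sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    # Exact two-tailed binomial probability with p = 0.5 (assumed comparison)
    tail = sum(comb(n, i) for i in range(0, min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_chi, p_exact

chi2, p_chi, p_exact = mcnemar(15, 5)
print(round(chi2, 2), round(p_chi, 3), round(p_exact, 3))
```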
-
- (14) "NORMAL" *
-
- The normal distribution has innumerable uses in statistics. This
- program specifically addresses three situations: First, it compares
- a sample mean to a population mean. Second, it calculates the
- proportion of samples that would be expected to fall in any given
- range under the normal curve. Third, it calculates the probability
- associated with any given value of z.
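-
The three situations can be sketched with the error function from Python's standard library. The function names are hypothetical, and the comparison of a sample mean to a population mean assumes the population standard deviation is known.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Probability that a standard normal deviate is below z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z for a sample mean of n observations against a known population."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

def proportion_in_range(low, high, mean, sd):
    """Expected proportion of values between low and high under the normal curve."""
    return normal_cdf((high - mean) / sd) - normal_cdf((low - mean) / sd)

print(round(proportion_in_range(-1.96, 1.96, 0, 1), 3))  # 0.95
print(round(1 - normal_cdf(z_for_sample_mean(105, 100, 15, 9)), 4))
```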
-
- (15) "POISSON"
-
- The Poisson distribution applies to dichotomous variables when
- the number of successes can be counted, but the number of failures
- cannot. It can also be used to approximate the binomial distribution
- when the number of trials is large (>100) and the expected rate is
- small (<5%). This program, like the Binomial program, calculates a
- ONE-TAILED probability.
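-
A one-tailed Poisson probability can be sketched as follows. As with the binomial sketch, the function name is hypothetical and the choice of tail (lower when the observed count is at or below the expected count, upper otherwise) is an assumption.

```python
from math import exp, factorial

def poisson_one_tailed(observed, expected):
    """One-tailed probability of `observed` events when `expected` are expected."""
    def pmf(i):
        return exp(-expected) * expected ** i / factorial(i)
    lower = sum(pmf(i) for i in range(0, observed + 1))
    if observed <= expected:
        return lower                    # P(X <= observed)
    return 1 - lower + pmf(observed)    # P(X >= observed)

print(round(poisson_one_tailed(1, 2.0), 4))
print(round(poisson_one_tailed(5, 2.0), 4))
```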
-
- (16) "RANDOMIZ"
-
- This random sample generator aids in the selection of random
- samples for several purposes. It can provide a random subset of a
- larger population, or it can assign cases randomly to independent or
- paired groups for case-control studies.
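-
The two basic tasks, drawing a random subset and splitting subjects into two independent groups, can be sketched with Python's `random` module. The function names, the numbering of population members from 1, and the even split into two groups are illustrative assumptions, not a description of how RANDOMIZ works internally.

```python
import random

def select_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct member numbers from 1..population_size."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

def assign_unpaired(subject_ids, seed=None):
    """Shuffle subjects and split them into two independent groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

print(select_sample(100, 10, seed=1))
print(assign_unpaired(range(1, 11), seed=1))
```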
-
-
-
-
-
-
-
- 8
-
-
-
-
- (17) "RANKTEST" *
-
- Three non-parametric tests of significance are performed by this
- program. They are appropriate for any sample which is clearly NOT
- normally distributed. They also specifically apply when quantitative
- variables are not available but qualitative ranks are. The RANK SUM
- TEST compares 2 independent samples. The SIGNED RANK TEST compares the
- medians of paired samples. Spearman's RANK CORRELATION calculates a
- correlation coefficient for paired samples. For the first two tests,
- the program calculates a TWO-TAILED exact probability associated with
- the various rank sums. Note that for samples larger than 20
- observations, the exact probability calculation can take several minutes.
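-
An exact probability for the RANK SUM TEST can be obtained by enumerating every way the pooled ranks could be divided between the two samples, which also suggests why the calculation slows down as samples grow. The sketch below is one such brute-force approach under stated assumptions: tied values receive average ranks, and "two-tailed" means rank sums at least as far from the expected sum as the observed one. It is not the RANKTEST algorithm, and the function names are hypothetical.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    first, count = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def rank_sum_exact(sample_a, sample_b):
    """Return (rank sum of sample_a, exact two-tailed probability)."""
    ranks = midranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(ranks) + 1) / 2
    deviation = abs(observed - expected)
    total = extreme = 0
    # Every possible assignment of n_a ranks to sample A
    for combo in combinations(ranks, n_a):
        total += 1
        if abs(sum(combo) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total

print(rank_sum_exact([1, 2, 3], [4, 5, 6]))
```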
-
- (18) "RATEADJ" *
-
- The rates adjustment program will adjust sample rates by either
- the direct or indirect methods. For DIRECT method adjustment, the
- datafile entered in DATA-ONE must include the study sample rates and
- the standard population figures. For INDIRECT method adjustment, the
- datafile used must include the study population figures and the
- standard population rates. After INDIRECT rate adjustment, the
- program will evaluate the probability of the observed number of cases
- using the Poisson distribution for small numbers, or the Chi-square
- distribution for large observed numbers.
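-
The two adjustment methods can be sketched as below, with one list entry per stratum (for example, per age group). The function names and the return of an observed-to-expected ratio for the INDIRECT method are illustrative assumptions; the significance test on the observed number of cases is not shown.

```python
def direct_adjusted_rate(study_rates, standard_population):
    """DIRECT method: apply study rates to the standard population."""
    expected_cases = sum(r * p for r, p in zip(study_rates, standard_population))
    return expected_cases / sum(standard_population)

def indirect_adjustment(observed_cases, study_population, standard_rates):
    """INDIRECT method: apply standard rates to the study population.

    Returns (expected cases, observed/expected ratio).
    """
    expected = sum(p * r for p, r in zip(study_population, standard_rates))
    return expected, observed_cases / expected

print(direct_adjusted_rate([0.01, 0.02], [1000, 3000]))
print(indirect_adjustment(42, [500, 1500], [0.01, 0.02]))
```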
-
- (19) "SAMPLSIZ" *
-
- The sample size program calculates the approximate sample sizes
- required to achieve statistical significance given certain specified
- levels of certainty. The following formulas are used:
-
- For a survey:
- N = [ z(a)*SQR(pi*(1-pi)) / d ] squared
- If N > 10% of entire population, then N' = N / (1+N/TP) .
-
- For a paired case-control study:
-
- N = [(z(a)*SQR(pi*(1-pi)) - z(b)*SQR(pi*(1-pi))) / (PT-pi) ] squared
-
- For an unpaired case-control study:
-
-         [ z(a)*SQR(2*pi*(1-pi)) - z(b)*SQR(PT*(1-PT) + PC*(1-PC)) ]
-     N = [ --------------------------------------------------------- ] squared
-         [                        (PT - PC)                          ]
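-
The survey formula can be worked through in a short sketch. The manual does not define its symbols, so the readings used here are assumptions: z(a) is taken as the standard normal deviate for the chosen confidence level, pi as the anticipated proportion, d as the acceptable error in that proportion, and TP as the total population size. The function name is hypothetical.

```python
from math import sqrt

def survey_sample_size(z_a, pi, d, total_population=None):
    """N = [z(a)*SQR(pi*(1-pi)) / d] squared, with the N' correction.

    The correction N' = N / (1 + N/TP) is applied only when N exceeds
    10% of the total population, as the manual states.
    """
    n = (z_a * sqrt(pi * (1 - pi)) / d) ** 2
    if total_population is not None and n > 0.10 * total_population:
        n = n / (1 + n / total_population)
    return n

print(round(survey_sample_size(1.96, 0.5, 0.05), 2))        # 384.16
print(round(survey_sample_size(1.96, 0.5, 0.05, 1000), 2))  # 277.54
```

In practice the result would be rounded up to the next whole observation.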
-
-
- (20) "T-TEST" *
-
- The Student's T-Test compares the means of two samples. The
- program provides both the paired and unpaired T-Test calculations.
- The program will also evaluate a known T value.
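-
Both calculations can be sketched with Python's `statistics` module. The unpaired version below assumes a pooled variance (equal variances in the two samples), which the manual does not specify; the function names are hypothetical, and the lookup of t in the T distribution is omitted.

```python
from math import sqrt
from statistics import mean, stdev, variance

def unpaired_t(a, b):
    """Return (t, degrees of freedom) for two independent samples."""
    na, nb = len(a), len(b)
    # Pooled variance: an assumption, not confirmed by the manual
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def paired_t(a, b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1

print(unpaired_t([1, 2, 3], [4, 5, 6]))
print(paired_t([5, 7, 9], [4, 5, 6]))
```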
-
-
-
-
-
-
-
-
-
-
- 9
-
-
-
- (21) "FILETRAN" *
-
- On occasion you may find that you want to compare 2 samples
- that are already entered in separate DATAFILES. Or you may have
- standard population figures in one datafile and sample rates to be
- adjusted in a different datafile. EPISTAT programs, however, only
- allow analysis of samples that are in a SINGLE DATAFILE. Rather than
- reenter one or both samples from keyboard, this file transfer program
- allows you to add a sample from DATAFILE #1 to any other DATAFILE #2.
- You may also create an entirely new datafile by selecting one sample
- from DATAFILE #1 and another from DATAFILE #2. Yet another option
- in FILETRAN is the ability to combine 2 samples into a single one by
- APPENDING one to the other. This utility program should make reentry
- of data unnecessary, regardless of the number of tests applied to it.
-
-
-
- NOTICE
-
- ---------------------------------------------------------------------
- Users may copy EPISTAT and distribute it to others on the following
- conditions:
- 1. The programs are not modified in any way.
- 2. Individual programs are not distributed separately.
- 3. No fee is charged for copying or distribution.
- ---------------------------------------------------------------------
-
-
- The concept of user-supported software is based on three
- principles:
-
- 1. The value and utility of software (programs) are best assessed
- by each user on his or her own system with his or her own data.
- Only after using a program can one determine whether it serves
- one's personal applications, needs, and tastes.
-
- 2. The creation of independent personal computer software requires
- a substantial commitment of time and effort. Rather than
- duplicate this effort time after time, the computing community
- can and should support individual creative efforts.
-
- 3. The copying and networking of programs should be encouraged,
- not restricted. The entire computing community benefits when
- the burden of copy-protection is removed.
-
-
- If after using EPISTAT, you find it of value, your contribution
- in any amount will be appreciated ( $20 suggested ).
-
- Send contributions to:
-
- Tracy L. Gustafson, M.D.
- 1705 Gattis School Road
- Round Rock, Texas 78664
-
-
-
- Thank you, and good luck.